Empirical comparisons of various discretization procedures
Authors
Abstract
Genuine symbolic machine learning (ML) algorithms can process only symbolic, categorical data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures already exist in the ML field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first one is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization fits the way KEX creates a knowledge base. Nevertheless, the resulting categorization is also suitable for other machine learning algorithms. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on several well-known ML databases. To make the comparison more informative, other ML algorithms such as ID3 and C4.5 were also run in our experiments. The results are then compared and discussed.
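To illustrate the general idea of discretization discussed above, the sketch below shows a simple class-boundary binning of one numerical attribute. This is a minimal, generic example under assumed toy data; it is not the KEX or CN4 procedure (the abstract does not specify their internals), and the function names and sample values are purely illustrative.

```python
# Illustrative sketch only: simple supervised discretization that places
# candidate cut points where the class label changes between consecutive
# sorted values, then maps numeric values to symbolic interval labels.
# NOT the KEX or CN4 algorithm; just the general concept of turning a
# numerical attribute into categories a symbolic learner can use.

def find_cut_points(values, classes):
    """Return cut points at midpoints between consecutive values
    whose class labels differ."""
    pairs = sorted(zip(values, classes))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2.0)  # midpoint between differing classes
    return cuts

def to_interval(value, cuts):
    """Map a numeric value to a symbolic interval label."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return f"bin_{i}"
    return f"bin_{len(cuts)}"

if __name__ == "__main__":
    # Hypothetical attribute (temperature) with a binary class label.
    temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
    play         = ["y", "n", "y", "y", "y", "n", "n", "y", "y", "y", "n", "n"]
    cuts = find_cut_points(temperatures, play)
    print(cuts)                   # candidate cut points
    print(to_interval(73, cuts))  # symbolic category for a new value
```

A real procedure such as the one in KEX would additionally merge or select intervals according to how the learner builds its knowledge base; this sketch only shows the generation of candidate intervals.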